Notes on data formatting from Deborah Wentworth:
Notes on duration of symptoms: Currently using symdur here, because I want to do this analysis on the continuous duration of symptoms. But I’m waiting for an email from Deb clarifying whether or not this variable represents total duration of symptoms or just duration prior to enrollment.
##
## Spearman's rank correlation rho
##
## data: valid$age and valid$symdur
## S = 66961000, p-value = 0.05384
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.07015608
There is a trend toward a very weak positive correlation here, but looking at the data shows that it’s not very strong.
##
## Spearman's rank correlation rho
##
## data: valid$age and valid$symdur
## S = 63158000, p-value = 1.29e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.1568748
Results suggest a weak, positive correlation, although this may be influenced by outliers.
H1N1
H3N2
H1N1
No clear effect of season.
H3N2
No clear effect of season –> Try models that don’t include season and country as blocking vars
We know the relationship between age and probability of infection should be non-linear. Use cross-validation to verify, and to choose the number of degrees of freedom to include in a spline
Plot the overall test error rate vs. df
Clear preference for 2 df
We know the relationship between age and probability of infection should be non-linear. Use cross-validation to verify, and to choose the number of degrees of freedom to include in a spline
Plot the overall test error rate vs. df
Conclusion: Clear preference for linear fit for H3N2
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
##
## Argentina Australia Austria Belgium Chile China Denmark
## 50 69 3 11 1 15 68
## Germany Greece Norway Peru Poland Singapore Spain
## 41 83 7 16 6 7 40
## Thailand UK USA
## 64 127 110
## [1] 718
Treat vaccination, antivial use, underlying symptoms, country and season as blocking variables with fixed effects
library(splines)
## H1N1 fit to medical history only (no effect of country, season, age or imprinting)
fit00 = glm(symdur ~ anyvac + anyav + anydx, data = train.H1N1)
summary(fit00)
##
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx, data = train.H1N1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.224 -2.967 -1.224 1.301 42.509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.4424 0.5525 9.851 <2e-16 ***
## anyvac1 0.2678 0.4503 0.595 0.552
## anyav1 0.2566 0.5228 0.491 0.624
## anydx1 0.5246 0.4483 1.170 0.242
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 24.04477)
##
## Null deviance: 15523 on 646 degrees of freedom
## Residual deviance: 15461 on 643 degrees of freedom
## AIC: 3899.5
##
## Number of Fisher Scoring iterations: 2
## H1N1 fit to country, season and medical history only (no effect of age or imprinting)
fit0 = glm(symdur ~ anyvac + anyav + anydx + country + season, data = train.H1N1)
summary(fit0)
##
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx + country + season,
## data = train.H1N1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.054 -2.854 -1.114 1.535 38.363
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.2162 1.4141 4.396 1.3e-05 ***
## anyvac1 -0.2739 0.4812 -0.569 0.56940
## anyav1 0.6877 0.5410 1.271 0.20416
## anydx1 0.1985 0.4645 0.427 0.66923
## countryAustralia 0.4665 1.2092 0.386 0.69978
## countryBelgium 0.6131 2.0853 0.294 0.76884
## countryDenmark -0.7396 1.3440 -0.550 0.58233
## countryGermany -0.5488 1.4602 -0.376 0.70714
## countryGreece -2.0609 1.3171 -1.565 0.11816
## countryOther -1.6289 1.3276 -1.227 0.22030
## countrySpain -0.3979 1.4426 -0.276 0.78278
## countryThailand -3.4437 1.1906 -2.892 0.00396 **
## countryUK -0.3995 1.2637 -0.316 0.75198
## countryUSA -0.8314 1.2854 -0.647 0.51800
## seasonNH.10.11 0.6383 0.7514 0.849 0.39600
## seasonNH.11.12 0.6758 1.9688 0.343 0.73155
## seasonNH.12.13 0.8007 1.5247 0.525 0.59964
## seasonNH.13.14 0.7895 0.7756 1.018 0.30910
## seasonNH.14.15 1.5225 1.5045 1.012 0.31193
## seasonNH.15.16 0.3512 0.6779 0.518 0.60462
## seasonNH.16.17 -0.3809 1.7783 -0.214 0.83047
## seasonSH.10 -0.1925 1.5268 -0.126 0.89970
## seasonSH.11 -1.7149 1.5618 -1.098 0.27260
## seasonSH.12 0.8042 2.3103 0.348 0.72788
## seasonSH.13 0.7138 1.4782 0.483 0.62937
## seasonSH.14 3.3423 1.8262 1.830 0.06771 .
## seasonSH.15 -1.7138 1.9668 -0.871 0.38391
## seasonSH.16 -1.1813 1.0953 -1.078 0.28124
## seasonSH.17 0.8921 2.3010 0.388 0.69838
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 23.56294)
##
## Null deviance: 15523 on 646 degrees of freedom
## Residual deviance: 14562 on 618 degrees of freedom
## AIC: 3910.7
##
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age only, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit1 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx + country + season, data = train.H1N1)
summary(fit1)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav +
## anydx + country + season, data = train.H1N1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.477 -2.842 -1.052 1.561 38.175
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.72824 1.50040 3.151 0.00170 **
## ns(age, knots = 65)1 3.69432 1.42018 2.601 0.00951 **
## ns(age, knots = 65)2 -1.09713 1.19831 -0.916 0.36025
## anyvac1 -0.32593 0.49096 -0.664 0.50702
## anyav1 0.64715 0.53848 1.202 0.22990
## anydx1 -0.09236 0.48236 -0.191 0.84821
## countryAustralia 0.42180 1.21037 0.348 0.72759
## countryBelgium 0.71160 2.08309 0.342 0.73276
## countryDenmark -0.64419 1.34493 -0.479 0.63213
## countryGermany -0.18651 1.46318 -0.127 0.89861
## countryGreece -2.16562 1.31107 -1.652 0.09908 .
## countryOther -1.64659 1.32349 -1.244 0.21392
## countrySpain -0.47702 1.43761 -0.332 0.74014
## countryThailand -3.34321 1.18597 -2.819 0.00497 **
## countryUK -0.25376 1.26653 -0.200 0.84127
## countryUSA -0.82410 1.28460 -0.642 0.52142
## seasonNH.10.11 0.55064 0.75111 0.733 0.46378
## seasonNH.11.12 1.25181 1.96937 0.636 0.52525
## seasonNH.12.13 0.46299 1.52508 0.304 0.76155
## seasonNH.13.14 0.68947 0.77270 0.892 0.37259
## seasonNH.14.15 1.55074 1.51573 1.023 0.30666
## seasonNH.15.16 0.21181 0.68527 0.309 0.75736
## seasonNH.16.17 -0.39135 1.77006 -0.221 0.82509
## seasonSH.10 0.13957 1.52445 0.092 0.92708
## seasonSH.11 -1.42473 1.55733 -0.915 0.36063
## seasonSH.12 1.41702 2.30926 0.614 0.53969
## seasonSH.13 1.08378 1.47666 0.734 0.46326
## seasonSH.14 3.37194 1.82386 1.849 0.06497 .
## seasonSH.15 -2.08363 1.96142 -1.062 0.28851
## seasonSH.16 -1.19781 1.09389 -1.095 0.27394
## seasonSH.17 0.53899 2.29319 0.235 0.81426
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 23.32775)
##
## Null deviance: 15523 on 646 degrees of freedom
## Residual deviance: 14370 on 616 degrees of freedom
## AIC: 3906.2
##
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit2 = glm(symdur ~ ns(age, knots = 65) + p.g1.protection + anyvac + anyav + anydx + country + season, data = train.H1N1)
summary(fit2)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g1.protection +
## anyvac + anyav + anydx + country + season, data = train.H1N1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.374 -2.791 -1.075 1.618 38.245
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.70242 1.50078 3.133 0.00181 **
## ns(age, knots = 65)1 5.05757 2.02296 2.500 0.01268 *
## ns(age, knots = 65)2 -0.46876 1.37006 -0.342 0.73236
## p.g1.protection -0.74556 0.78783 -0.946 0.34434
## anyvac1 -0.32310 0.49101 -0.658 0.51077
## anyav1 0.61757 0.53943 1.145 0.25271
## anydx1 -0.07143 0.48290 -0.148 0.88246
## countryAustralia 0.45781 1.21107 0.378 0.70554
## countryBelgium 0.82620 2.08679 0.396 0.69230
## countryDenmark -0.61053 1.34552 -0.454 0.65017
## countryGermany -0.14568 1.46394 -0.100 0.92076
## countryGreece -2.08020 1.31428 -1.583 0.11399
## countryOther -1.57598 1.32570 -1.189 0.23498
## countrySpain -0.47964 1.43773 -0.334 0.73879
## countryThailand -3.31511 1.18644 -2.794 0.00537 **
## countryUK -0.16912 1.26980 -0.133 0.89409
## countryUSA -0.73398 1.28823 -0.570 0.56905
## seasonNH.10.11 0.53954 0.75127 0.718 0.47292
## seasonNH.11.12 1.35576 1.97260 0.687 0.49215
## seasonNH.12.13 0.38859 1.52724 0.254 0.79924
## seasonNH.13.14 0.61820 0.77643 0.796 0.42622
## seasonNH.14.15 1.44206 1.52020 0.949 0.34320
## seasonNH.15.16 0.10177 0.69512 0.146 0.88365
## seasonNH.16.17 -0.36119 1.77050 -0.204 0.83842
## seasonSH.10 0.18536 1.52535 0.122 0.90332
## seasonSH.11 -1.47592 1.55840 -0.947 0.34397
## seasonSH.12 1.47016 2.31014 0.636 0.52476
## seasonSH.13 1.07195 1.47683 0.726 0.46821
## seasonSH.14 3.38689 1.82408 1.857 0.06382 .
## seasonSH.15 -2.04971 1.96191 -1.045 0.29655
## seasonSH.16 -1.21520 1.09413 -1.111 0.26715
## seasonSH.17 0.51003 2.29359 0.222 0.82410
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 23.33171)
##
## Null deviance: 15523 on 646 degrees of freedom
## Residual deviance: 14349 on 615 degrees of freedom
## AIC: 3907.2
##
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age only, plus fixed effects of vaccination, av use, underlying conditions NO COUNTRY AND SEASON
fit3 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx, data = train.H1N1)
summary(fit3)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav +
## anydx, data = train.H1N1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.516 -2.903 -1.217 1.358 42.092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1430 0.7317 5.662 2.26e-08 ***
## ns(age, knots = 65)1 2.8045 1.3238 2.118 0.0345 *
## ns(age, knots = 65)2 -1.9740 1.1414 -1.729 0.0842 .
## anyvac1 0.2967 0.4550 0.652 0.5145
## anyav1 0.2028 0.5214 0.389 0.6975
## anydx1 0.3367 0.4635 0.727 0.4678
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 23.82754)
##
## Null deviance: 15523 on 646 degrees of freedom
## Residual deviance: 15273 on 641 degrees of freedom
## AIC: 3895.6
##
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, NO COUNTRY AND SEASON
fit4 = glm(symdur ~ ns(age, knots = 65) + p.g1.protection + anyvac + anyav + anydx, data = train.H1N1)
summary(fit4)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g1.protection +
## anyvac + anyav + anydx, data = train.H1N1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.442 -2.857 -1.189 1.325 42.195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1516 0.7318 5.673 2.12e-08 ***
## ns(age, knots = 65)1 4.1425 1.8885 2.194 0.0286 *
## ns(age, knots = 65)2 -1.3668 1.2947 -1.056 0.2915
## p.g1.protection -0.7624 0.7674 -0.993 0.3208
## anyvac1 0.3041 0.4550 0.668 0.5042
## anyav1 0.1625 0.5230 0.311 0.7561
## anydx1 0.3669 0.4645 0.790 0.4298
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 23.82802)
##
## Null deviance: 15523 on 646 degrees of freedom
## Residual deviance: 15250 on 640 degrees of freedom
## AIC: 3896.6
##
## Number of Fisher Scoring iterations: 2
## Plot the fits for each model side by side
par(mfrow = c(1, 2), las = 3, mar = c(6, 6, 6, 3), cex.axis = .7)
library(gam)
## Loading required package: foreach
## Loaded gam 1.14-4
plot.gam(fit00, se = T, col = 'dodgerblue', main = 'Simplified Baseline')
plot.gam(fit0, se = T, col = 'dodgerblue', main = 'Baseline')
plot.gam(fit1, se = T, col = 'dodgerblue', main = 'Baseline + age + imp')
plot.gam(fit2, se = T, col = 'dodgerblue', main = 'Baseline + age + imp')
plot.gam(fit3, se = T, col = 'dodgerblue', main = 'Simple baseline + age + imp')
plot.gam(fit4, se = T, col = 'dodgerblue', main = 'Simple baseline + age + imp')
pdf('003Symdur_H1N1.pdf')
plot.gam(fit4, se = T, col = 'dodgerblue', main = 'Simple baseline + age + imp')
dev.off()
## quartz_off_screen
## 2
# Predict probs for each patient
pred00 = predict(fit00, newdata = test.H1N1)
pred0 = predict(fit0, newdata = test.H1N1)
pred1 = predict(fit1, newdata = test.H1N1)
pred2 = predict(fit2, newdata = test.H1N1)
pred3 = predict(fit3, newdata = test.H1N1)
pred4 = predict(fit4, newdata = test.H1N1)
# Estimate the simulated error rate
MSE = numeric(6); names(MSE) = c('simple.baseline', 'baseline', 'baeline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
MSE[1] = mean((pred00 - test.H1N1$symdur)^2)
MSE[2] = mean((pred0 - test.H1N1$symdur)^2)
MSE[3] = mean((pred1 - test.H1N1$symdur)^2)
MSE[4] = mean((pred2 - test.H1N1$symdur)^2)
MSE[5] = mean((pred3 - test.H1N1$symdur)^2)
MSE[6] = mean((pred4 - test.H1N1$symdur)^2)
sort(MSE)
## simple.baseline+age+imp simple.baseline+age baseline+age+imp
## 73.82638 74.18532 75.63663
## simple.baseline baeline+age baseline
## 75.66241 76.21245 78.30160
AIC = c(fit00$aic, fit0$aic, fit1$aic, fit2$aic, fit3$aic, fit4$aic); names(AIC) = c('simple.baseline', 'baseline', 'baeline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
del.AIC = sort(AIC - min(AIC))
del.AIC
## simple.baseline+age simple.baseline+age+imp simple.baseline
## 0.000000 1.002958 3.887447
## baeline+age baseline+age+imp baseline
## 10.545393 11.603895 15.133104
load('master.AIC.table.H1N1.RData')
master.AIC.table['003Symdur', c('simple.baseline', 'baseline', 'baseline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')] = AIC - min(AIC)
save(master.AIC.table, file = 'master.AIC.table.H1N1.RData')
rm(master.AIC.table)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
##
## Argentina Australia Belgium China Denmark Germany Greece
## 111 122 3 7 26 16 24
## Peru Singapore Spain Thailand UK USA
## 9 24 28 125 84 168
## [1] 747
library(splines)
## H3N2 fit to medical history only (no effect of country, season, age or imprinting)
fit00 = glm(symdur ~ anyvac + anyav + anydx, data = train.H3N2)
summary(fit00)
##
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx, data = train.H3N2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.832 -2.998 -1.205 0.795 142.495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.5050 1.0099 7.432 3.29e-13 ***
## anyvac1 1.2075 0.6108 1.977 0.04845 *
## anyav1 -2.6264 0.7837 -3.351 0.00085 ***
## anydx1 0.1191 0.8515 0.140 0.88879
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 58.11606)
##
## Null deviance: 39766 on 672 degrees of freedom
## Residual deviance: 38880 on 669 degrees of freedom
## AIC: 4649.9
##
## Number of Fisher Scoring iterations: 2
## H3N2 fit to country, season and medical history only (no effect of age or imprinting)
fit0 = glm(symdur ~ anyvac + anyav + anydx + country + season, data = train.H3N2)
summary(fit0)
##
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx + country + season,
## data = train.H3N2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.972 -2.873 -0.964 1.036 133.189
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.23197 3.08747 1.695 0.0906 .
## anyvac1 0.70120 0.69945 1.003 0.3165
## anyav1 -1.88909 0.82156 -2.299 0.0218 *
## anydx1 0.04999 0.88978 0.056 0.9552
## countryAustralia 0.44214 1.32185 0.334 0.7381
## countryBelgium -1.83483 4.69204 -0.391 0.6959
## countryDenmark 8.97077 2.14544 4.181 3.3e-05 ***
## countryGreece -1.21917 2.26788 -0.538 0.5911
## countryOther 1.24425 2.01647 0.617 0.5374
## countrySingapore -1.46714 1.83897 -0.798 0.4253
## countrySpain 0.58980 2.16634 0.272 0.7855
## countryThailand -1.49057 1.41048 -1.057 0.2910
## countryUK 0.38486 1.72155 0.224 0.8232
## countryUSA -0.23631 1.63350 -0.145 0.8850
## seasonNH.11.12 2.50392 2.96734 0.844 0.3991
## seasonNH.12.13 2.55262 2.58711 0.987 0.3242
## seasonNH.13.14 1.56102 2.88256 0.542 0.5883
## seasonNH.14.15 2.60802 2.55482 1.021 0.3077
## seasonNH.15.16 2.07245 3.04396 0.681 0.4962
## seasonNH.16.17 0.80441 2.57899 0.312 0.7552
## seasonSH.10 -0.91160 4.56512 -0.200 0.8418
## seasonSH.11 0.19072 3.53065 0.054 0.9569
## seasonSH.12 1.63826 3.11286 0.526 0.5989
## seasonSH.13 -0.62362 3.47057 -0.180 0.8575
## seasonSH.14 2.08340 2.91735 0.714 0.4754
## seasonSH.15 1.51763 2.84487 0.533 0.5939
## seasonSH.16 1.72769 2.84630 0.607 0.5441
## seasonSH.17 2.03845 2.89072 0.705 0.4810
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 56.40278)
##
## Null deviance: 39766 on 672 degrees of freedom
## Residual deviance: 36380 on 645 degrees of freedom
## AIC: 4653.2
##
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age only, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit1 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx + country + season, data = train.H3N2)
summary(fit1)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav +
## anydx + country + season, data = train.H3N2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.127 -2.866 -1.021 0.989 132.853
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.4989 3.1558 1.742 0.0819 .
## ns(age, knots = 65)1 -1.1374 2.2680 -0.502 0.6162
## ns(age, knots = 65)2 0.9499 1.2462 0.762 0.4462
## anyvac1 0.6418 0.7129 0.900 0.3683
## anyav1 -1.8476 0.8236 -2.243 0.0252 *
## anydx1 0.1335 0.9460 0.141 0.8878
## countryAustralia 0.5395 1.3293 0.406 0.6850
## countryBelgium -1.7323 4.6976 -0.369 0.7124
## countryDenmark 9.0824 2.1525 4.220 2.8e-05 ***
## countryGreece -1.1533 2.2712 -0.508 0.6118
## countryOther 1.2988 2.0194 0.643 0.5203
## countrySingapore -1.2189 1.8646 -0.654 0.5135
## countrySpain 0.6136 2.1689 0.283 0.7773
## countryThailand -1.4218 1.4138 -1.006 0.3150
## countryUK 0.4959 1.7364 0.286 0.7753
## countryUSA -0.1030 1.6534 -0.062 0.9504
## seasonNH.11.12 2.6257 2.9756 0.882 0.3779
## seasonNH.12.13 2.6569 2.5962 1.023 0.3065
## seasonNH.13.14 1.6663 2.8894 0.577 0.5644
## seasonNH.14.15 2.7299 2.5615 1.066 0.2870
## seasonNH.15.16 2.0174 3.0489 0.662 0.5084
## seasonNH.16.17 0.9119 2.5845 0.353 0.7243
## seasonSH.10 -1.0940 4.5814 -0.239 0.8113
## seasonSH.11 0.3042 3.5368 0.086 0.9315
## seasonSH.12 1.7526 3.1206 0.562 0.5746
## seasonSH.13 -0.4184 3.4838 -0.120 0.9044
## seasonSH.14 2.1878 2.9222 0.749 0.4543
## seasonSH.15 1.5589 2.8480 0.547 0.5843
## seasonSH.16 1.7725 2.8510 0.622 0.5343
## seasonSH.17 2.1312 2.8961 0.736 0.4621
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 56.50456)
##
## Null deviance: 39766 on 672 degrees of freedom
## Residual deviance: 36332 on 643 degrees of freedom
## AIC: 4656.3
##
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit2 = glm(symdur ~ ns(age, knots = 65) + p.g2.protection + anyvac + anyav + anydx + country + season, data = train.H3N2)
summary(fit2)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g2.protection +
## anyvac + anyav + anydx + country + season, data = train.H3N2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.248 -2.861 -1.054 1.129 132.569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.17332 3.50635 2.331 0.0201 *
## ns(age, knots = 65)1 -6.05825 3.62474 -1.671 0.0951 .
## ns(age, knots = 65)2 -0.54897 1.51375 -0.363 0.7170
## p.g2.protection -2.88270 1.65809 -1.739 0.0826 .
## anyvac1 0.59638 0.71223 0.837 0.4027
## anyav1 -1.92775 0.82359 -2.341 0.0196 *
## anydx1 0.16448 0.94467 0.174 0.8618
## countryAustralia 0.59643 1.32758 0.449 0.6534
## countryBelgium -1.16583 4.70157 -0.248 0.8042
## countryDenmark 9.19426 2.15005 4.276 2.19e-05 ***
## countryGreece -1.17415 2.26763 -0.518 0.6048
## countryOther 1.21842 2.01673 0.604 0.5460
## countrySingapore -1.34055 1.86299 -0.720 0.4721
## countrySpain 0.59166 2.16556 0.273 0.7848
## countryThailand -1.49361 1.41214 -1.058 0.2906
## countryUK 0.48342 1.73366 0.279 0.7805
## countryUSA -0.04448 1.65120 -0.027 0.9785
## seasonNH.11.12 2.75155 2.97183 0.926 0.3549
## seasonNH.12.13 2.81201 2.59362 1.084 0.2787
## seasonNH.13.14 1.84509 2.88667 0.639 0.5229
## seasonNH.14.15 2.99777 2.56216 1.170 0.2424
## seasonNH.15.16 2.27336 3.04763 0.746 0.4560
## seasonNH.16.17 1.25487 2.58802 0.485 0.6279
## seasonSH.10 -0.78542 4.57764 -0.172 0.8638
## seasonSH.11 0.74457 3.54034 0.210 0.8335
## seasonSH.12 1.74835 3.11574 0.561 0.5749
## seasonSH.13 -0.25317 3.47966 -0.073 0.9420
## seasonSH.14 2.22679 2.91770 0.763 0.4456
## seasonSH.15 1.64230 2.84395 0.577 0.5638
## seasonSH.16 1.98142 2.84903 0.695 0.4870
## seasonSH.17 2.40937 2.89598 0.832 0.4057
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 56.32738)
##
## Null deviance: 39766 on 672 degrees of freedom
## Residual deviance: 36162 on 642 degrees of freedom
## AIC: 4655.1
##
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age only, plus fixed effects of vaccination, av use, underlying conditions NO COUNTRY AND SEASON
fit3 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx, data = train.H3N2)
summary(fit3)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav +
## anydx, data = train.H3N2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.176 -3.000 -1.380 0.797 142.251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.8797 1.2618 6.245 7.56e-10 ***
## ns(age, knots = 65)1 -1.0588 2.2634 -0.468 0.64009
## ns(age, knots = 65)2 0.5504 1.2005 0.458 0.64677
## anyvac1 1.1954 0.6158 1.941 0.05268 .
## anyav1 -2.5994 0.7857 -3.308 0.00099 ***
## anydx1 0.2320 0.9041 0.257 0.79751
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 58.25312)
##
## Null deviance: 39766 on 672 degrees of freedom
## Residual deviance: 38855 on 667 degrees of freedom
## AIC: 4653.5
##
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, NO COUNTRY AND SEASON
fit4 = glm(symdur ~ ns(age, knots = 65) + p.g2.protection + anyvac + anyav + anydx, data = train.H3N2)
summary(fit4)
##
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g2.protection +
## anyvac + anyav + anydx, data = train.H3N2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.326 -3.052 -1.301 0.813 142.187
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.0375 2.0593 4.874 1.37e-06 ***
## ns(age, knots = 65)1 -4.7924 3.6128 -1.327 0.185124
## ns(age, knots = 65)2 -0.6109 1.4857 -0.411 0.681062
## p.g2.protection -2.1789 1.6439 -1.325 0.185486
## anyvac1 1.1802 0.6156 1.917 0.055646 .
## anyav1 -2.6417 0.7859 -3.361 0.000821 ***
## anydx1 0.2690 0.9040 0.298 0.766161
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 58.1871)
##
## Null deviance: 39766 on 672 degrees of freedom
## Residual deviance: 38753 on 666 degrees of freedom
## AIC: 4653.7
##
## Number of Fisher Scoring iterations: 2
## Plot the fits for each model side by side
par(mfrow = c(1, 2), las = 3, mar = c(6, 6, 6, 3), cex.axis = .7)
library(gam)
plot.gam(fit00, se = T, col = 'firebrick1', main = 'Simplified Baseline')
plot.gam(fit0, se = T, col = 'firebrick1', main = 'Baseline')
plot.gam(fit1, se = T, col = 'firebrick1', main = 'Baseline + age + imp')
plot.gam(fit2, se = T, col = 'firebrick1', main = 'Baseline + age + imp')
plot.gam(fit3, se = T, col = 'firebrick1', main = 'Simple baseline + age + imp')
plot.gam(fit4, se = T, col = 'firebrick1', main = 'Simple baseline + age + imp')
pdf('003Symdur_H3N2.pdf')
plot.gam(fit4, se = T, col = 'firebrick1', main = 'Simple baseline + age + imp')
dev.off()
## quartz_off_screen
## 2
# Predict probs for each patient
pred00 = predict(fit00, newdata = test.H3N2)
pred0 = predict(fit0, newdata = test.H3N2)
pred1 = predict(fit1, newdata = test.H3N2)
pred2 = predict(fit2, newdata = test.H3N2)
pred3 = predict(fit3, newdata = test.H3N2)
pred4 = predict(fit4, newdata = test.H3N2)
# Estimate the simulated error rate
MSE = numeric(6); names(MSE) = c('simple.baseline', 'baseline', 'baeline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
MSE[1] = mean((pred00 - test.H3N2$symdur)^2)
MSE[2] = mean((pred0 - test.H3N2$symdur)^2)
MSE[3] = mean((pred1 - test.H3N2$symdur)^2)
MSE[4] = mean((pred2 - test.H3N2$symdur)^2)
MSE[5] = mean((pred3 - test.H3N2$symdur)^2)
MSE[6] = mean((pred4 - test.H3N2$symdur)^2)
sort(MSE)
## simple.baseline simple.baseline+age simple.baseline+age+imp
## 20.47214 20.85406 20.88790
## baseline baeline+age baseline+age+imp
## 22.51058 22.85079 23.08748
AIC = c(fit00$aic, fit0$aic, fit1$aic, fit2$aic, fit3$aic, fit4$aic); names(AIC) = c('simple.baseline', 'baseline', 'baseline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
del.AIC = sort(AIC - min(AIC))
del.AIC
## simple.baseline baseline simple.baseline+age
## 0.000000 3.274308 3.570356
## simple.baseline+age+imp baseline+age+imp baseline+age
## 3.797487 5.236422 6.397545
load('master.AIC.table.H3N2.RData')
master.AIC.table['003Symdur', c('simple.baseline', 'baseline', 'baseline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')] = AIC - min(AIC)
save(master.AIC.table, file = 'master.AIC.table.H3N2.RData')
rm(master.AIC.table)